71 research outputs found

    Transcript expression-aware annotation improves rare variant interpretation

    The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD)(1), we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the 'proportion expressed across transcripts', which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project(2) and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases. 
Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies.

    Gene family information facilitates variant interpretation and identification of disease-associated genes in neurodevelopmental disorders

    Background: Classifying the pathogenicity of missense variants represents a major challenge in clinical practice during the diagnosis of rare and genetically heterogeneous neurodevelopmental disorders (NDDs). While orthologous gene conservation is commonly employed in variant annotation, approximately 80% of known disease-associated genes belong to gene families. The use of gene family information for disease gene discovery and variant interpretation has not yet been investigated on a genome-wide scale. We empirically evaluate whether paralog-conserved or non-conserved sites in human gene families are important in NDDs. Methods: Gene family information was collected from Ensembl. Paralog-conserved sites were defined based on paralog sequence alignments; 10,068 NDD patients and 2,078 controls were statistically evaluated for de novo variant burden in gene families. Results: We demonstrate that disease-associated missense variants are enriched at paralog-conserved sites across all disease groups and inheritance models tested. We developed a gene family de novo enrichment framework that identified 43 exome-wide enriched gene families including 98 de novo variant carrying genes in NDD patients, of which 28 represent novel candidate genes for NDD that are brain expressed and under evolutionary constraint. Conclusion: This study represents the first method to incorporate gene family information into a statistical framework to interpret variant data for NDDs and to discover new NDD-associated genes.

    Genetic risk for autism spectrum disorders and neuropsychiatric variation in the general population

    Almost all genetic risk factors for autism spectrum disorders (ASDs) can be found in the general population, but the effects of that risk are unclear in people not ascertained for neuropsychiatric symptoms. Using several large ASD consortia and population-based resources, we find genetic links between ASDs and typical variation in social behavior and adaptive functioning. This finding is evidenced through both inherited and de novo variation, indicating that multiple types of genetic risk for ASDs influence a continuum of behavioral and developmental traits, the severe tail of which can result in an ASD or other neuropsychiatric disorder diagnosis. A continuum model should inform the design and interpretation of studies of neuropsychiatric disease biology.

    Whole-genome sequencing reveals host factors underlying critical COVID-19

    Critical Covid-19 is caused by immune-mediated inflammatory lung injury. Host genetic variation influences the development of illness requiring critical care(1) or hospitalisation(2-4) following SARS-CoV-2 infection. The GenOMICC (Genetics of Mortality in Critical Care) study enables the comparison of genomes from critically-ill cases with population controls in order to find underlying disease mechanisms. Here, we use whole genome sequencing in 7,491 critically-ill cases compared with 48,400 controls to discover and replicate 23 independent variants that significantly predispose to critical Covid-19. We identify 16 new independent associations, including variants within genes involved in interferon signalling (IL10RB, PLSCR1), leucocyte differentiation (BCL11A), and blood type antigen secretor status (FUT2). Using transcriptome-wide association and colocalisation to infer the effect of gene expression on disease severity, we find evidence implicating multiple genes, including reduced expression of a membrane flippase (ATP11A), and increased mucin expression (MUC1), in critical disease. Mendelian randomisation provides evidence in support of causal roles for myeloid cell adhesion molecules (SELE, ICAM5, CD209) and coagulation factor F8, all of which are potentially druggable targets. Our results are broadly consistent with a multi-component model of Covid-19 pathophysiology, in which at least two distinct mechanisms can predispose to life-threatening disease: failure to control viral replication, or an enhanced tendency towards pulmonary inflammation and intravascular coagulation. We show that comparison between critically-ill cases and population controls is highly efficient for detection of therapeutically-relevant mechanisms of disease.

    De novo Variants in Neurodevelopmental Disorders with Epilepsy

    Neurodevelopmental disorders (NDD) with epilepsy constitute a complex and heterogeneous phenotypic spectrum of largely unclear genetic architecture. We conducted exome-wide enrichment analyses for protein-altering de novo variants (DNV) in 7,088 parent-offspring trios with NDD, of which 2,151 were comorbid with epilepsy. In this cohort, the genetic spectra of epileptic encephalopathy (EE) and nonspecific NDD with epilepsy were markedly similar. We identified 33 genes significantly enriched for DNV in NDD with epilepsy, of which 27.3% were associated with therapeutic consequences. These 33 DNV-enriched genes were more often associated with synaptic transmission but less often with chromatin modification when compared to NDD without epilepsy. On average, only 53% of the DNV-enriched genes were represented on available diagnostic sequencing panels, so our findings should drive significant improvements of genetic testing approaches.

    Analysis of protein-coding genetic variation in 60,706 humans

    Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. We describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation, identifying 3,230 genes with near-complete depletion of truncating variants, 72% of which have no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human “knockout” variants in protein-coding genes.

    Exome-wide association study to identify rare variants influencing COVID-19 outcomes: Results from the Host Genetics Initiative
